Data from astronomical observations is growing faster than ever before, producing datasets of unprecedented size and dimensionality. For example, modern sky survey missions such as Kepler record a large number of measurements (photometric, orbital, and spatial) for a wide range of stars. Detecting anomalies (i.e. data points that differ markedly from the rest) in these high-dimensional datasets is essential for identifying rare astrophysical occurrences and distinguishing them from noise or sensor artefacts. Many traditional approaches to anomaly detection, including rule-based approaches and single-model machine learning methods, do not detect anomalies reliably because they are sensitive to noise in the data and often assume that the data follow a particular distribution. To address this problem, this paper presents a new framework for multi-model anomaly detection using an agreement-based approach that incorporates six anomaly detection algorithms: Isolation Forest, One-Class Support Vector Machine, Local Outlier Factor, DBSCAN, Elliptic Envelope, and Random Forest. Each model independently identifies potential anomalies, and an agreement scoring mechanism produces a final classification from the predictions of all the models.
The proposed framework was evaluated on a Kepler-like dataset containing 50,000 rows and 25 features that represent stellar, photometric, orbital, and spatial properties of the stars. The evaluation shows an overall F1 score of 0.94 for the proposed framework, an improvement over each of the individual anomaly detection models on this dataset. The framework has been incorporated into a web-based platform called AstroVision, which includes interactive PCA and t-SNE visualizations, automated reporting, and integration with astronomical data pipelines.
Introduction
Detecting anomalies in large-scale, high-dimensional astronomical datasets is challenging, and this paper proposes a robust solution called AstroVision. Modern missions such as Kepler, TESS, and Gaia generate massive datasets of stellar, photometric, orbital, and spatial measurements, making manual analysis impractical. Anomalies in these datasets can indicate rare astrophysical phenomena or instrumental artifacts, but traditional threshold-based and statistical methods struggle with high dimensionality and complex feature interactions, and often produce many false positives or false negatives.
Key points of the proposed approach:
Consensus-based multi-model anomaly detection: Integrates multiple machine learning algorithms (e.g., Isolation Forest, One-Class SVM, LOF) to reduce model-specific bias and capture diverse types of anomalies.
Web-based platform (AstroVision): Enables interactive analysis, visualization using PCA and t-SNE, and automated reporting.
Dataset and preprocessing: Uses a Kepler-inspired dataset with 50,000 observations and 25 features spanning stellar, photometric, orbital, and spatial attributes. Data are cleaned and standardized, and extreme values are capped. False positives are treated as anomalies for evaluation.
Performance: Ensemble approach improves robustness and achieves higher F1 scores than individual models while providing interpretable insights.
Summary: AstroVision provides a scalable, interactive framework for robust anomaly detection in high-dimensional astronomical data by combining multiple machine learning models, enhancing detection of rare astrophysical events while mitigating limitations of single-model approaches.
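The preprocessing and consensus steps listed above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the AstroVision implementation: the median imputation, the 1st/99th percentile caps, the contamination rate, the DBSCAN settings, and the three-vote agreement threshold are all assumed values, and the data are synthetic stand-ins. Random Forest is left out of the sketch because it needs labelled examples, so only the five unsupervised detectors vote here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def preprocess(X, lower_pct=1.0, upper_pct=99.0):
    """Fill gaps with feature medians, cap extremes per feature, standardize."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Cap each feature at assumed percentile bounds (hypothetical choice).
    low, high = np.percentile(X, [lower_pct, upper_pct], axis=0)
    X = np.clip(X, low, high)
    return StandardScaler().fit_transform(X)


def consensus_votes(X, contamination=0.05, eps=1.0, min_samples=5, seed=0):
    """Count, for every row, how many detectors label it an anomaly."""
    detectors = [
        IsolationForest(contamination=contamination, random_state=seed),
        OneClassSVM(nu=contamination, gamma="scale"),
        LocalOutlierFactor(n_neighbors=20, contamination=contamination),
        EllipticEnvelope(contamination=contamination, random_state=seed),
        DBSCAN(eps=eps, min_samples=min_samples),  # noise points get label -1
    ]
    votes = np.zeros(X.shape[0], dtype=int)
    for detector in detectors:
        # Every detector above marks anomalies (or noise) with -1.
        votes += (detector.fit_predict(X) == -1).astype(int)
    return votes


# Synthetic stand-in data: 300 ordinary points plus 15 scattered outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(300, 4))
outliers = rng.uniform(-10, 10, size=(15, 4))
X = preprocess(np.vstack([inliers, outliers]))

votes = consensus_votes(X)
is_anomaly = votes >= 3  # hypothetical agreement threshold (3 of 5 detectors)
print("flagged rows:", int(is_anomaly.sum()))
```

Raising the agreement threshold trades recall for precision: a row flagged by all detectors is a stronger candidate than one flagged by a bare majority. The DBSCAN radius in particular would need retuning for a 25-feature dataset.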
Conclusion
This study presents a consensus-based multi-model anomaly detection framework for high-dimensional astronomical datasets. The proposed approach integrates six anomaly detection algorithms (Isolation Forest, One-Class Support Vector Machine, Local Outlier Factor, DBSCAN, Elliptic Envelope, and Random Forest) into a unified consensus scoring mechanism.
Experimental evaluation on a 50,000-observation Kepler-inspired dataset demonstrates that the consensus framework achieves an F1-score of 0.94, outperforming all individual anomaly detection models. These results indicate that combining models with complementary inductive biases improves anomaly detection reliability in complex, high-dimensional datasets.
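For readers who want to reproduce this kind of comparison, the F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below scores a single detector and a consensus prediction against ground-truth labels using scikit-learn. The label vectors are tiny made-up examples chosen so the arithmetic can be checked by hand; they are not results from the paper, and the helper name score_detector is hypothetical.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score


def score_detector(y_true, y_pred):
    """Precision, recall, and F1 with anomalies encoded as the positive class (1)."""
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# Toy labels (illustrative only): 1 = anomaly, 0 = normal.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred_single = np.array([1, 0, 0, 1, 1, 0, 0, 0])     # TP=1, FP=2, FN=2
pred_consensus = np.array([1, 1, 0, 1, 0, 0, 0, 0])  # TP=2, FP=1, FN=1

scores = {
    "single": score_detector(y_true, pred_single),
    "consensus": score_detector(y_true, pred_consensus),
}
for name, result in scores.items():
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In this toy case the single detector has P = R = 1/3 and the consensus prediction has P = R = 2/3, so the F1-scores are 1/3 and 2/3 respectively.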
The framework is implemented within the AstroVision web platform, which provides interactive 3D visualizations using PCA and t-SNE, along with automated report generation. By integrating machine learning techniques with an accessible web interface, the platform enables users to perform anomaly detection without requiring extensive technical expertise.
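As a rough illustration of how the 3D coordinates behind such visualizations might be computed, the sketch below projects a feature matrix to three dimensions with scikit-learn's PCA and t-SNE. It is not the platform's code: the function name project_3d, the perplexity value, and the random 25-feature matrix are assumptions made for the example, and the plotting layer is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def project_3d(X, perplexity=30.0, seed=0):
    """Return 3D PCA coordinates, 3D t-SNE coordinates, and PCA variance ratios."""
    pca = PCA(n_components=3, random_state=seed)
    coords_pca = pca.fit_transform(X)
    # t-SNE is stochastic; fixing the seed and using PCA initialization
    # makes repeated runs on the same data easier to compare.
    tsne = TSNE(n_components=3, perplexity=perplexity, init="pca",
                random_state=seed)
    coords_tsne = tsne.fit_transform(X)
    return coords_pca, coords_tsne, pca.explained_variance_ratio_


# Random stand-in for a standardized 25-feature table (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 25))

coords_pca, coords_tsne, variance_ratio = project_3d(X_demo)
print("PCA variance retained:", round(float(variance_ratio.sum()), 3))
```

PCA is linear and preserves global variance structure, while t-SNE emphasizes local neighbourhoods, so the two views are complementary. Distances between well-separated groups in a t-SNE plot should not be read quantitatively.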
The findings suggest that consensus-based ensemble learning offers a practical and scalable solution for anomaly detection in large-scale astronomical surveys and may also be applicable to other domains involving high-dimensional observational data.
References
[1] N. M. Batalha et al., "Planetary candidates observed by Kepler. III. Analysis of the first 16 months of data," Astrophys. J. Suppl., vol. 204, no. 2, p. 24, Feb. 2013.
[2] J. L. Coughlin et al., "Contamination in the Kepler field: Astrophysical false positives from ground-based follow-up observations," Astron. J., vol. 147, no. 5, p. 119, May 2014.
[3] M. Pruzhinskaya et al., "Anomaly detection in the Open Supernova Catalog," Mon. Not. R. Astron. Soc., vol. 489, no. 3, pp. 3591–3601, Oct. 2019.
[4] K. Masci et al., "The Zwicky Transient Facility: Data processing, products and archive," Publ. Astron. Soc. Pac., vol. 131, no. 995, p. 018003, Jan. 2019.
[5] C. J. Shallue and A. Vanderburg, "Identifying exoplanets with deep learning," Astron. J., vol. 155, no. 2, p. 94, Feb. 2018.
[6] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation forest," in Proc. IEEE ICDM, 2008, pp. 413–422.
[7] M. M. Breunig et al., "LOF: Identifying density-based local outliers," in Proc. ACM SIGMOD, 2000, pp. 93–104.
[8] B. Schölkopf et al., "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1471, 2001.
[9] M. Ester et al., "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. KDD, 1996, pp. 226–231.
[10] P. J. Rousseeuw and K. V. Driessen, "A fast algorithm for the minimum covariance determinant estimator," Technometrics, vol. 41, no. 3, pp. 212–223, 1999.
[11] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[12] T. G. Dietterich, "Ensemble methods in machine learning," in Proc. MCS, 2000, pp. 1–15.
[13] A. Lazarevic and V. Kumar, "Feature bagging for outlier detection," in Proc. KDD, 2005, pp. 157–166.
[14] F. Keller et al., "HiCS: High contrast subspaces for density-based outlier ranking," in Proc. ICDE, 2012, pp. 1037–1048.
[15] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surveys, vol. 41, no. 3, pp. 1–58, 2009.
[16] M. Goldstein and S. Uchida, "A comparative evaluation of unsupervised anomaly detection algorithms," Neurocomputing, vol. 72, pp. 224–245, 2016.
[17] E. Eskin, "Anomaly detection over noisy data using learned probability distributions," in Proc. ICML, 2000.
[18] S. Aggarwal, "Outlier Analysis," Springer, 2017.
[19] H. Song, M. Kim, and J. Lee, "Robust anomaly detection using ensemble techniques," Expert Systems with Applications, vol. 42, no. 9, pp. 1–10, 2015.
[20] S. Rawat, "Kepler Exoplanet Dataset," Kaggle, 2026. [Online]. Available: https://www.kaggle.com/datasets/sneharawat080/kelper-exoplanet-dataset